The genome sequence of the cut surfclam, Spisula subtruncata (da Costa, 1778)

We present a genome assembly from a specimen of Spisula subtruncata (the cut surfclam; Mollusca; Bivalvia; Venerida; Mactridae). The genome sequence is 930.8 megabases in span. Most of the assembly is scaffolded into 19 chromosomal pseudomolecules. The mitochondrial genome has also been assembled and is 19.64 kilobases in length.


Background
Surf clams (Mactridae) are commonly eaten worldwide and are an important fisheries resource (Degraer et al., 2007; Fahy et al., 2003; Kuykendall et al., 2017). Spisula subtruncata is one of three British Spisula species and is found in silty or muddy sand from the low intertidal to shallow shelf depths around the UK. It is a filter feeder with a northeast Atlantic distribution that extends from Norway south to Spain and into the Mediterranean (GBIF Secretariat, 2024).
Spisula subtruncata has a thick, solid shell, but the outline is variable and two forms exist. One is very fat and squat, with large, swollen umbones and a sculpture of heavy concentric lines; the anterior dorsal margin is shorter than the posterior dorsal margin, and the posterior margin is subtruncate. The other form is more elongated and has finer concentric lines, but also has a subtruncate posterior margin. In both forms the pallial sinus is short, moderately curved and points towards the anterior margin. The shell is white or cream and is covered with a thin, pale brown periostracum, which wears off in patches (Degraer et al., 2007).
S. subtruncata can be confused with Mulinia lateralis, a non-native species. This American species was first discovered in Europe in 2017 and has recently been discovered in the UK (Holmes et al., 2023). It has a distinct radial ridge on the posterior margin, which distinguishes it from S. subtruncata.
Here we present a chromosomal-level whole genome sequence for Spisula subtruncata, based on a specimen from Plymouth Sound, Devon, UK.

Genome sequence report
The genome was sequenced from a specimen of Spisula subtruncata (Figure 1) collected from Drakes Island East, Plymouth Sound, Devon, UK (latitude 50.35). A total of 31-fold coverage in Pacific Biosciences single-molecule HiFi long reads was generated. Primary assembly contigs were scaffolded with chromosome conformation Hi-C data. Manual assembly curation corrected 98 missing joins or mis-joins and removed 73 haplotypic duplications, reducing the assembly length by 3.71% and the scaffold number by 29.63%, and decreasing the scaffold N50 by 2.99%. The final assembly has a total length of 930.8 Mb in 151 sequence scaffolds with a scaffold N50 of 48.3 Mb (Table 1). The snail plot in Figure 2 provides a summary of the assembly statistics, while the distribution of assembly scaffolds on GC proportion and coverage is shown in Figure 3.
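For readers unfamiliar with the scaffold N50 statistic quoted above, the following is a minimal sketch of how it is conventionally computed. The function name and the toy lengths are illustrative assumptions and are not taken from the assembly or from the pipelines used for this genome note.

```python
def scaffold_n50(lengths):
    """Return the N50: the length L such that scaffolds of length >= L
    together cover at least half of the total assembly span."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("lengths must be non-empty")


# Hypothetical toy scaffold lengths (in Mb), NOT the real xbSpiSubt1.1 scaffolds.
toy_lengths = [80, 70, 50, 40, 30, 20, 10]
toy_n50 = scaffold_n50(toy_lengths)
print(toy_n50)
```

Applied to the real scaffold lengths, the same calculation would be expected to give the 48.3 Mb reported in Table 1.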
The cumulative assembly plot in Figure 4 shows curves for subsets of scaffolds assigned to different phyla. Most (99.25%) of the assembly sequence was assigned to 19 chromosomal-level scaffolds. Chromosome-scale scaffolds confirmed by the Hi-C data are named in order of size (Figure 5; Table 2). While not fully phased, the assembly deposited is of one haplotype. Contigs corresponding to the second haplotype have also been deposited. The mitochondrial genome was also assembled and can be found as a contig within the multifasta file of the genome submission.
The estimated Quality Value (QV) of the final assembly is 67.8 with k-mer completeness of 100.0%, and the assembly has a BUSCO v5.4.3 completeness of 80.0% (single = 78.5%, duplicated = 1.5%), using the mollusca_odb10 reference set (n = 5,295).
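The quality figures above can be unpacked with a little arithmetic. The sketch below assumes the QV is on the usual Phred scale (QV = -10 log10 of the per-base error probability). It converts the percentages to approximate gene counts; the counts are back-calculated from rounded percentages and are therefore only indicative. All variable names are illustrative.

```python
def qv_to_error_rate(qv):
    """Convert a Phred-scaled quality value to a per-base error probability."""
    return 10 ** (-qv / 10)


error_rate = qv_to_error_rate(67.8)   # roughly 1.7e-7 errors per base
bases_per_error = 1 / error_rate      # roughly one error per six million bases

# BUSCO summary reported for mollusca_odb10 (n = 5,295).
n_busco = 5295
single_pct, duplicated_pct = 78.5, 1.5
complete_pct = single_pct + duplicated_pct          # complete = single + duplicated
approx_complete_genes = round(complete_pct / 100 * n_busco)

print(f"{error_rate:.2e}", round(bases_per_error), complete_pct, approx_complete_genes)
```

On these assumptions, a completeness of 80.0% corresponds to roughly 4,236 of the 5,295 mollusca_odb10 orthologues being recovered as complete.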

Sequencing
Pacific Biosciences HiFi circular consensus DNA sequencing libraries were constructed according to the manufacturers' instructions. Poly(A) RNA-Seq libraries were constructed using the NEB Ultra II RNA Library Prep kit. DNA and RNA sequencing was performed by the Scientific Operations core at the WSI on Pacific Biosciences Sequel IIe (HiFi) and Illumina NovaSeq 6000 (RNA-Seq) instruments. Hi-C data were also generated from tissue of xbSpiSubt5 using the Arima2 kit and sequenced on the Illumina NovaSeq 6000 instrument.
The sanger-tol/blobtoolkit pipeline is a Nextflow port of the previous Snakemake BlobToolKit pipeline (Challis et al., 2020). It aligns the PacBio reads with SAMtools and minimap2 (Li, 2018) and generates coverage tracks for regions of fixed size. In parallel, it queries the GoaT database (Challis et al., 2023) to identify all matching BUSCO lineages to run BUSCO (Manni et al., 2021). For the three domain-level BUSCO lineages, the pipeline aligns the BUSCO genes to the UniProt Reference Proteomes database (Bateman et al., 2023) with DIAMOND (Buchfink et al., 2021) blastp. The genome is also split into chunks according to the density of the BUSCO genes from the taxonomically closest lineage, and each chunk is aligned to the UniProt Reference Proteomes database with DIAMOND blastx. Genome sequences that have no hit are then chunked with seqtk and aligned to the NT database with blastn (Altschul et al., 1990).
All of these outputs are combined with the blobtools suite into a blobdir for visualisation.
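To illustrate what a coverage track for regions of fixed size means in practice, here is a minimal sketch that averages per-base read depth over fixed-width windows. It is not the sanger-tol/blobtoolkit implementation; the function name, window size and depth values are hypothetical.

```python
def windowed_mean_coverage(depths, window):
    """Average per-base depth over consecutive fixed-size windows.

    Returns a list of (start, end, mean_depth) tuples with half-open
    coordinates; the final window may be shorter than `window`.
    """
    if window <= 0:
        raise ValueError("window must be a positive integer")
    tracks = []
    for start in range(0, len(depths), window):
        chunk = depths[start:start + window]
        tracks.append((start, start + len(chunk), sum(chunk) / len(chunk)))
    return tracks


# Hypothetical per-base depths for a 10 bp toy sequence, NOT real data.
toy_depths = [30, 32, 31, 29, 0, 0, 60, 62, 61, 59]
tracks = windowed_mean_coverage(toy_depths, window=4)
for start, end, mean_depth in tracks:
    print(start, end, mean_depth)
```

Summarising depth per window rather than per base keeps the track small enough to plot for an assembly of this size, at the cost of smoothing over local dips such as the zero-depth positions in the toy example.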
All three pipelines were developed using the nf-core tooling (Ewels et al., 2020), use MultiQC (Ewels et al., 2016), and make extensive use of the Conda package manager and the Bioconda initiative (Grüning et al., 2018).
Table 3 contains a list of relevant software tool versions and sources.

Wellcome Sanger Institute - Legal and Governance
The materials that have contributed to this genome note have been supplied by a Darwin Tree of Life Partner. The submission of materials by a Darwin Tree of Life Partner is subject to the 'Darwin Tree of Life Project Sampling Code of Practice', which can be found in full on the Darwin Tree of Life website. By agreeing with and signing up to the Sampling Code of Practice, the Darwin Tree of Life Partner agrees they will meet the legal and ethical requirements and standards set out within this document in respect of all samples acquired for, and supplied to, the Darwin Tree of Life Project.

Further, the Wellcome Sanger Institute employs a process whereby due diligence is carried out proportionate to the nature of the materials themselves, and the circumstances under which they have been/are to be collected and provided for use.
The purpose of this is to address and mitigate any potential legal and/or ethical implications of receipt and use of the materials as part of the research project, and to ensure that in doing so we align with best practice wherever possible. The overarching areas of consideration are:
• Ethical review of provenance and sourcing of the material

Zhihua Lin, Zhejiang Wanli University, Ningbo, Zhejiang, China
Yongbo Bao, Zhejiang Wanli University, Ningbo, Zhejiang, China
This Data Note provides a necessary introduction to the cut surfclam, Spisula subtruncata.
The methods for genome sequencing and analysis are reasonable, and the description is detailed.
The data have been registered in the European Nucleotide Archive, making them easily accessible to readers. Uploading the data to NCBI at the same time would allow more readers to access them. In conclusion, I believe this paper can be indexed.

Is the rationale for creating the dataset(s) clearly described? Yes
Are the protocols appropriate and is the work technically sound? Yes
Are sufficient details of methods and materials provided to allow replication by others? Yes
Are the datasets clearly presented in a useable and accessible format? Yes
Competing Interests: No competing interests were disclosed.
We confirm that we have read this submission and believe that we have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.

The assembled genome demonstrates high-quality continuity; however, its completeness appears to be somewhat limited, with a BUSCO completeness score of 80%. This is relatively low compared to similar genomes produced using the same methodology and falls below the desired standards. Nonetheless, this genome represents a valuable addition to the databases for an underrepresented species and deserves recognition in this manuscript.
The lower completeness score might be attributed to challenges associated with repetitive DNA sequences, as Spisula subtruncata is known to contain a significant amount of satellite DNA, predominantly within the heterochromatic regions of its chromosomes. This newly assembled genome will facilitate further investigations by other researchers into these repetitive elements and their genomic implications.
Reviewer Expertise: Genomics
I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.

Figure 2. Genome assembly of Spisula subtruncata, xbSpiSubt1.1: metrics. The BlobToolKit snail plot shows N50 metrics and BUSCO gene completeness. The main plot is divided into 1,000 size-ordered bins around the circumference with each bin representing 0.1% of the 930,862,144 bp assembly. The distribution of scaffold lengths is shown in dark grey with the plot radius scaled to the longest scaffold present in the assembly (75,865,956 bp, shown in red). Orange and pale-orange arcs show the N50 and N90 scaffold lengths (48,288,179 and 34,259,483 bp), respectively. The pale grey spiral shows the cumulative scaffold count on a log scale with white scale lines showing successive orders of magnitude. The blue and pale-blue area around the outside of the plot shows the distribution of GC, AT and N percentages in the same bins as the inner plot. A summary of complete, fragmented, duplicated and missing BUSCO genes in the mollusca_odb10 set is shown in the top right. An interactive version of this figure is available at https://blobtoolkit.genomehubs.org/view/Spisula_subtruncata/dataset/GCA_963678985.1/snail.
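The binning described in the Figure 2 caption can be checked with simple arithmetic. The sketch below uses only the numbers quoted in the caption; the variable names are illustrative and the code is not part of BlobToolKit.

```python
assembly_span = 930_862_144   # bp, total assembly length from the caption
n_bins = 1000                 # size-ordered bins around the circumference
longest_scaffold = 75_865_956 # bp, longest scaffold from the caption

bin_span = assembly_span / n_bins                   # bp represented by one 0.1% bin
longest_fraction = longest_scaffold / assembly_span # share of the assembly in the longest scaffold
bins_for_longest = longest_scaffold / bin_span      # number of bins that scaffold spans

print(f"{bin_span:.0f} bp per bin")
print(f"longest scaffold: {longest_fraction:.2%} of assembly, about {bins_for_longest:.1f} bins")
```

Each bin therefore represents a little under 1 Mb of sequence, and the longest scaffold alone accounts for roughly 8% of the plot circumference.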

Figure 3. Genome assembly of Spisula subtruncata, xbSpiSubt1.1: BlobToolKit GC-coverage plot. Sequences are coloured by phylum. Circles are sized in proportion to sequence length. Histograms show the distribution of sequence length sum along each axis. An interactive version of this figure is available at https://blobtoolkit.genomehubs.org/view/Spisula_subtruncata/dataset/GCA_963678985.1/blob.

Figure 4. Genome assembly of Spisula subtruncata, xbSpiSubt1.1: BlobToolKit cumulative sequence plot. The grey line shows cumulative length for all sequences. Coloured lines show cumulative lengths of sequences assigned to each phylum using the buscogenes taxrule. An interactive version of this figure is available at https://blobtoolkit.genomehubs.org/view/Spisula_subtruncata/dataset/GCA_963678985.1/cumulative.

Figure 5. Genome assembly of Spisula subtruncata, xbSpiSubt1.1: Hi-C contact map of the xbSpiSubt1.1 assembly, visualised using HiGlass. Chromosomes are shown in order of size from left to right and top to bottom. An interactive version of this figure may be viewed at https://genome-note-higlass.tol.sanger.ac.uk/l/?d=NB6vXH6NQ-m3FVQdho4_Xg.

Reviewer Report 15 July 2024
https://doi.org/10.21956/wellcomeopenres.24563.r87885
© 2024 Garcia-Souto D. This is an open access peer review report distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Daniel Garcia-Souto, University of Santiago de Compostela, Santiago de Compostela, Galicia, Spain
Adkins et al. present the first comprehensive description of the genome of the surfclam Spisula subtruncata.

References
1. García-Souto D, Mravinac B, Šatović E, Plohl M, et al.: Methylation profile of a satellite DNA constituting the intercalary G+C-rich heterochromatin of the cut trough shell Spisula subtruncata (Bivalvia, Mactridae). Sci Rep. 2017; 7(1): 6930.

Is the rationale for creating the dataset(s) clearly described? Yes
Are the protocols appropriate and is the work technically sound? Yes
Are sufficient details of methods and materials provided to allow replication by others? Yes
Are the datasets clearly presented in a useable and accessible format? Yes
Competing Interests: No competing interests were disclosed.
Reviewer Expertise: Molecular cytogenetics, malacology, genomics, repetitive DNA
I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.

1. Introduction: The authors mention two forms of S. subtruncata. They should explain whether these forms have any genetic differences and specify which form was sequenced.
2. Figure 1: Include a scale bar in the specimen photo. If a scale bar is unavailable, state the adult size in the figure caption.
3. BUSCO Score: The BUSCO score appears low. Can the authors provide potential reasons, such as the nature of the genome, assembly/sequencing issues, or other factors?

Is the rationale for creating the dataset(s) clearly described? Yes
Are the protocols appropriate and is the work technically sound? Yes
Are sufficient details of methods and materials provided to allow replication by others? Yes
Are the datasets clearly presented in a useable and accessible format? Yes
Competing Interests: No competing interests were disclosed.

Table 1. Genome data for Spisula subtruncata, xbSpiSubt1.1.
Project accession data
* Assembly metric benchmarks are adapted from column VGP-2020 of "Table 1: Proposed standards and metrics for defining genome assembly quality" from Rhie et al. (2021).
** BUSCO scores based on the mollusca_odb10 BUSCO set using version v5.4.3. C = complete [S = single copy, D = duplicated], F = fragmented, M = missing, n = number of orthologues in comparison. A full set of BUSCO scores is available at https://blobtoolkit.genomehubs.org/view/Spisula_subtruncata/dataset/GCA_963678985.1/busco.

The workflow for high molecular weight (HMW) DNA extraction at the Wellcome Sanger Institute (WSI) Tree of Life Core Laboratory includes a sequence of core procedures: sample preparation, sample homogenisation, DNA extraction, fragmentation, and clean-up. In sample preparation at the WSI Tree of Life Core Laboratory, the xbSpiSubt1 sample was weighed and dissected on dry ice (Jay et al., 2023). Somatic tissue was homogenised using a PowerMasher II tissue disruptor (Denton et al., 2023a). HMW DNA was extracted in the WSI Scientific Operations core using the Automated MagAttract v2 protocol (Oatley et al., 2023). The DNA was sheared into an average fragment size of 12-20 kb in a Megaruptor 3 system.
RNA was extracted from tissue of xbSpiSubt10 in the Tree of Life Laboratory at the WSI using the RNA Extraction: Automated MagMax™ mirVana protocol (do Amaral et al., 2023). The RNA concentration was assessed using a Nanodrop spectrophotometer and a Qubit Fluorometer using the Qubit RNA Broad-Range Assay kit. Analysis of the integrity of the RNA was done using the Agilent RNA 6000 Pico Kit and Eukaryotic Total RNA assay.
Protocols developed by the WSI Tree of Life laboratory are publicly available on protocols.io (Denton et al., 2023b).